Stabilised wavelet Mellin transform: an auditory strategy for normalising sound-source size
Abstract
We hear phonemes pronounced by men, women and children as approximately the same, although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length of the vocal tract is compressed or expanded proportionally with the same cross-sectional area function. The compressed and dilated versions of the impulse response can be converted into the same distribution using the Mellin transform. In this paper we show that the Mellin transform can be applied to the stabilised wavelet transform that forms the basis of the Auditory Image Model (AIM) of processing in the auditory pathway. The combined processing normalises source size information and produces a new, fruitful representation of source shape information, referred to as the “Mellin Image.” This “Stabilised Wavelet-Mellin Transform” (SWMT) also provides the mathematical framework for the derivation of the gammachirp auditory filterbank and the signal synchronous analysis in AIM.
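The scale-invariance property that the abstract relies on can be checked numerically. The sketch below is only an illustration under stated assumptions, not the SWMT or any part of AIM: the damped cosine `impulse_response`, the dilation factor `a = 1.5`, the integration grid, and the helper `mellin_transform` are all hypothetical choices made for the example. It evaluates the Mellin transform on the line s = 1/2 + jc by sampling uniformly in log-time, and compares a signal with an energy-normalised, time-compressed copy of itself.

```python
import numpy as np

def mellin_transform(f, c_values, u_min=-20.0, u_max=6.0, n=1 << 14):
    """Approximate M(c) = integral of f(t) * t**(-1/2 + j*c) dt over t > 0.

    With t = exp(u) the integral becomes
    integral of f(exp(u)) * exp(u/2) * exp(j*c*u) du,
    which is evaluated here as a plain Riemann sum on a uniform u grid.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    weighted = f(np.exp(u)) * np.exp(0.5 * u)
    kernel = np.exp(1j * np.outer(c_values, u))
    return kernel @ weighted * du

def impulse_response(t):
    # Hypothetical stand-in for a vocal-tract impulse response.
    return t * np.exp(-t) * np.cos(5.0 * t)

a = 1.5  # assumed dilation factor: a > 1 compresses the response in time

def dilated(t):
    # Energy-normalised, time-scaled copy of the same response.
    return np.sqrt(a) * impulse_response(a * t)

c = np.linspace(0.0, 10.0, 41)
M_ref = mellin_transform(impulse_response, c)
M_dil = mellin_transform(dilated, c)

# Scaling only multiplies M(c) by the unit-modulus factor a**(-j*c),
# so the magnitude distributions should coincide.
max_diff = np.max(np.abs(np.abs(M_ref) - np.abs(M_dil)))
phase_factor = np.exp(-1j * c * np.log(a))
print("largest magnitude difference:", max_diff)
```

In this sketch the two magnitude profiles agree to within numerical error, while the size change survives only in the phase term `phase_factor`. That is the sense in which a Mellin-type transform can separate size from shape.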
Similar resources
Extracting Size and Shape Information of Sound Source in an Optimal Auditory Processing Model
We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...
Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform
We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal t...
Sound resynthesis from Auditory Mellin Image using STRAIGHT
We propose an Auditory VOCODER to resynthesize sound from the Auditory Mellin Image which is an auditory representation that segregates the size and shape information of incoming sound. The sound resynthesis part consists of three techniques: the STRAIGHT VOCODER [2], frequency-warping cepstral analysis [4,12], and nonlinear multivariate regression analysis (MRA). We explain these methods and t...
The Perception of Scale in Speech
We can recognize vowel sounds regardless of whether a man, woman or child pronounces them. Such vowel normalization has proved to be a difficult task for computer models to simulate. Motivated by observations of the auditory system, Irino and Patterson have discussed the stabilized wavelet Mellin transform as a candidate method for vowel normalization. The aim of this paper is to quantify and ex...
An Auditory Vocoder Resynthesis of Speech from an Auditory Mellin Representation
An auditory Mellin transform has been proposed to segregate information about the size and shape of the vocal tract automatically; the process is also independent of glottal pitch. In this paper, we describe a method for resynthesizing speech from the Mellin representation using a high quality vocoder (STRAIGHT), and a nonlinear function to map between the two representations of speech. This en...